{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Tied States Hidden Markov Model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "authors:<br>\n",
    "Jacob Schreiber [<a href=\"sendto:jmchreiber91@gmail.com\">jmchreiber91@gmail.com</a>],<br>\n",
    "Nicholas Farn [<a href=\"sendto:nicholasfarn@gmail.com\">nicholasfarn@gmail.com</a>]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An example of using tied states to represent the same distribution across multiple states. This example is a toy example derived from biology, where we will look at DNA sequences.\n",
    "\n",
    "The fake structure we will pretend exists is:\n",
    "```\n",
    "start -> background -> CG island -> background -> poly-T region\n",
    "```\n",
    "DNA is comprised of four nucleotides, A, C, G, and T. Lets say that in the background sequence, all of these occur at the same frequency. In the CG island, the nucleotides C and G occur more frequently. In the poly T region, T occurs most frequently.\n",
    "\n",
    "We need the graph structure, because we fake know that the sequence must return to the background distribution between the CG island and the poly-T region. However, we also fake know that both background distributions need to be the same."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pomegranate import *\n",
    "import random\n",
    "import numpy as np\n",
    "\n",
    "random.seed(0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lets start off with an example without tied states and see what happens."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "untiedmodel = HiddenMarkovModel( \"No Tied States\" )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here we'll define the four states."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "background_one = State( DiscreteDistribution({'A': 0.25, 'C':0.25, 'G': 0.25, 'T':0.25 }), name=\"B1\" )\n",
    "CG_island = State( DiscreteDistribution({'A': 0.1, 'C':0.4, 'G': 0.4, 'T':0.1 }), name=\"CG\" )\n",
    "background_two = State( DiscreteDistribution({'A': 0.25, 'C':0.25, 'G': 0.25, 'T':0.25 }), name=\"B2\" )\n",
    "poly_T = State( DiscreteDistribution({'A': 0.1, 'C':0.1, 'G': 0.1, 'T':0.7 }), name=\"PT\" )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then add the starting transitions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "untiedmodel.add_transition( untiedmodel.start, background_one, 1. )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The transition matrix."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "untiedmodel.add_transition( background_one, background_one, 0.9 )\n",
    "untiedmodel.add_transition( background_one, CG_island, 0.1 )\n",
    "untiedmodel.add_transition( CG_island, CG_island, 0.8 )\n",
    "untiedmodel.add_transition( CG_island, background_two, 0.2 )\n",
    "untiedmodel.add_transition( background_two, background_two, 0.8 )\n",
    "untiedmodel.add_transition( background_two, poly_T, 0.2 )\n",
    "untiedmodel.add_transition( poly_T, poly_T, 0.7 )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And finally the ending transitions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "untiedmodel.add_transition( poly_T, untiedmodel.end, 0.3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finishing with the method \"bake\" to finalize the structure of our model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "untiedmodel.bake( verbose=True )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's define the following sequences. Keep in mind training must by done on a list of lists, not on a string in order to allow strings of any length."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "sequences = [ numpy.array(list(\"TAGCACATCGCAGCGCATCACGCGCGCTAGCATATAAGCACGATCAGCACGACTGTTTTT\")),\n",
    "\t      numpy.array(list(\"TAGAATCGCTACATAGACGCGCGCTCGCCGCGCTCGATAAGCTACGAACACGATTTTTTA\")),\n",
    "\t      numpy.array(list(\"GATAGCTACGACTACGCGACTCACGCGCGCGCTCCGCATCAGACACGAATATAGATAAGATATTTTTT\")) ]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lets check our distributions before training our model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "B1: {\n",
      "    \"class\" :\"Distribution\",\n",
      "    \"dtype\" :\"str\",\n",
      "    \"name\" :\"DiscreteDistribution\",\n",
      "    \"parameters\" :[\n",
      "        {\n",
      "            \"A\" :0.25,\n",
      "            \"C\" :0.25,\n",
      "            \"G\" :0.25,\n",
      "            \"T\" :0.25\n",
      "        }\n",
      "    ],\n",
      "    \"frozen\" :false\n",
      "}\n",
      "B2: {\n",
      "    \"class\" :\"Distribution\",\n",
      "    \"dtype\" :\"str\",\n",
      "    \"name\" :\"DiscreteDistribution\",\n",
      "    \"parameters\" :[\n",
      "        {\n",
      "            \"A\" :0.25,\n",
      "            \"C\" :0.25,\n",
      "            \"G\" :0.25,\n",
      "            \"T\" :0.25\n",
      "        }\n",
      "    ],\n",
      "    \"frozen\" :false\n",
      "}\n",
      "CG: {\n",
      "    \"class\" :\"Distribution\",\n",
      "    \"dtype\" :\"str\",\n",
      "    \"name\" :\"DiscreteDistribution\",\n",
      "    \"parameters\" :[\n",
      "        {\n",
      "            \"A\" :0.1,\n",
      "            \"C\" :0.4,\n",
      "            \"G\" :0.4,\n",
      "            \"T\" :0.1\n",
      "        }\n",
      "    ],\n",
      "    \"frozen\" :false\n",
      "}\n",
      "PT: {\n",
      "    \"class\" :\"Distribution\",\n",
      "    \"dtype\" :\"str\",\n",
      "    \"name\" :\"DiscreteDistribution\",\n",
      "    \"parameters\" :[\n",
      "        {\n",
      "            \"A\" :0.1,\n",
      "            \"C\" :0.1,\n",
      "            \"G\" :0.1,\n",
      "            \"T\" :0.7\n",
      "        }\n",
      "    ],\n",
      "    \"frozen\" :false\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "result = \"\\n\".join( \"{}: {}\".format( state.name, state.distribution )\n",
    "\tfor state in untiedmodel.states if not state.is_silent() )\n",
    "\n",
    "print(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now lets train our model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{\n",
       "    \"class\" : \"HiddenMarkovModel\",\n",
       "    \"name\" : \"No Tied States\",\n",
       "    \"start\" : {\n",
       "        \"class\" : \"State\",\n",
       "        \"distribution\" : null,\n",
       "        \"name\" : \"No Tied States-start\",\n",
       "        \"weight\" : 1.0\n",
       "    },\n",
       "    \"end\" : {\n",
       "        \"class\" : \"State\",\n",
       "        \"distribution\" : null,\n",
       "        \"name\" : \"No Tied States-end\",\n",
       "        \"weight\" : 1.0\n",
       "    },\n",
       "    \"states\" : [\n",
       "        {\n",
       "            \"class\" : \"State\",\n",
       "            \"distribution\" : {\n",
       "                \"class\" : \"Distribution\",\n",
       "                \"dtype\" : \"str\",\n",
       "                \"name\" : \"DiscreteDistribution\",\n",
       "                \"parameters\" : [\n",
       "                    {\n",
       "                        \"A\" : 0.31639790507943677,\n",
       "                        \"C\" : 0.2920431428628597,\n",
       "                        \"G\" : 0.21191370125516387,\n",
       "                        \"T\" : 0.1796452508025395\n",
       "                    }\n",
       "                ],\n",
       "                \"frozen\" : false\n",
       "            },\n",
       "            \"name\" : \"B1\",\n",
       "            \"weight\" : 1.0\n",
       "        },\n",
       "        {\n",
       "            \"class\" : \"State\",\n",
       "            \"distribution\" : {\n",
       "                \"class\" : \"Distribution\",\n",
       "                \"dtype\" : \"str\",\n",
       "                \"name\" : \"DiscreteDistribution\",\n",
       "                \"parameters\" : [\n",
       "                    {\n",
       "                        \"A\" : 0.4178729606164158,\n",
       "                        \"C\" : 0.22068400964255355,\n",
       "                        \"G\" : 0.19558677075666792,\n",
       "                        \"T\" : 0.16585625898436263\n",
       "                    }\n",
       "                ],\n",
       "                \"frozen\" : false\n",
       "            },\n",
       "            \"name\" : \"B2\",\n",
       "            \"weight\" : 1.0\n",
       "        },\n",
       "        {\n",
       "            \"class\" : \"State\",\n",
       "            \"distribution\" : {\n",
       "                \"class\" : \"Distribution\",\n",
       "                \"dtype\" : \"str\",\n",
       "                \"name\" : \"DiscreteDistribution\",\n",
       "                \"parameters\" : [\n",
       "                    {\n",
       "                        \"A\" : 0.061102410527201965,\n",
       "                        \"C\" : 0.5058121809386029,\n",
       "                        \"G\" : 0.3384433263771247,\n",
       "                        \"T\" : 0.09464208215707043\n",
       "                    }\n",
       "                ],\n",
       "                \"frozen\" : false\n",
       "            },\n",
       "            \"name\" : \"CG\",\n",
       "            \"weight\" : 1.0\n",
       "        },\n",
       "        {\n",
       "            \"class\" : \"State\",\n",
       "            \"distribution\" : {\n",
       "                \"class\" : \"Distribution\",\n",
       "                \"dtype\" : \"str\",\n",
       "                \"name\" : \"DiscreteDistribution\",\n",
       "                \"parameters\" : [\n",
       "                    {\n",
       "                        \"A\" : 0.09055188417721106,\n",
       "                        \"C\" : 4.501540436607585e-09,\n",
       "                        \"G\" : 0.01138728389704469,\n",
       "                        \"T\" : 0.8980608274242039\n",
       "                    }\n",
       "                ],\n",
       "                \"frozen\" : false\n",
       "            },\n",
       "            \"name\" : \"PT\",\n",
       "            \"weight\" : 1.0\n",
       "        },\n",
       "        {\n",
       "            \"class\" : \"State\",\n",
       "            \"distribution\" : null,\n",
       "            \"name\" : \"No Tied States-start\",\n",
       "            \"weight\" : 1.0\n",
       "        },\n",
       "        {\n",
       "            \"class\" : \"State\",\n",
       "            \"distribution\" : null,\n",
       "            \"name\" : \"No Tied States-end\",\n",
       "            \"weight\" : 1.0\n",
       "        }\n",
       "    ],\n",
       "    \"end_index\" : 5,\n",
       "    \"start_index\" : 4,\n",
       "    \"silent_index\" : 4,\n",
       "    \"edges\" : [\n",
       "        [\n",
       "            4,\n",
       "            0,\n",
       "            1.0,\n",
       "            1.0,\n",
       "            null\n",
       "        ],\n",
       "        [\n",
       "            0,\n",
       "            0,\n",
       "            0.9489674911696251,\n",
       "            0.9,\n",
       "            null\n",
       "        ],\n",
       "        [\n",
       "            0,\n",
       "            2,\n",
       "            0.05103250883037496,\n",
       "            0.1,\n",
       "            null\n",
       "        ],\n",
       "        [\n",
       "            2,\n",
       "            2,\n",
       "            0.9256192712029668,\n",
       "            0.8,\n",
       "            null\n",
       "        ],\n",
       "        [\n",
       "            2,\n",
       "            1,\n",
       "            0.07438072879703318,\n",
       "            0.2,\n",
       "            null\n",
       "        ],\n",
       "        [\n",
       "            1,\n",
       "            1,\n",
       "            0.9570959686922387,\n",
       "            0.8,\n",
       "            null\n",
       "        ],\n",
       "        [\n",
       "            1,\n",
       "            3,\n",
       "            0.04290403130776134,\n",
       "            0.2,\n",
       "            null\n",
       "        ],\n",
       "        [\n",
       "            3,\n",
       "            3,\n",
       "            0.8417505801851788,\n",
       "            0.7,\n",
       "            null\n",
       "        ],\n",
       "        [\n",
       "            3,\n",
       "            5,\n",
       "            0.15824941981482132,\n",
       "            0.3,\n",
       "            null\n",
       "        ]\n",
       "    ],\n",
       "    \"distribution ties\" : []\n",
       "}"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "untiedmodel.fit( sequences, stop_threshold=0.01 )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And check our new distributions after training."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "B1: {\n",
      "    \"class\" :\"Distribution\",\n",
      "    \"dtype\" :\"str\",\n",
      "    \"name\" :\"DiscreteDistribution\",\n",
      "    \"parameters\" :[\n",
      "        {\n",
      "            \"A\" :0.31639790507943677,\n",
      "            \"C\" :0.2920431428628597,\n",
      "            \"G\" :0.21191370125516387,\n",
      "            \"T\" :0.1796452508025395\n",
      "        }\n",
      "    ],\n",
      "    \"frozen\" :false\n",
      "}\n",
      "B2: {\n",
      "    \"class\" :\"Distribution\",\n",
      "    \"dtype\" :\"str\",\n",
      "    \"name\" :\"DiscreteDistribution\",\n",
      "    \"parameters\" :[\n",
      "        {\n",
      "            \"A\" :0.4178729606164158,\n",
      "            \"C\" :0.22068400964255355,\n",
      "            \"G\" :0.19558677075666792,\n",
      "            \"T\" :0.16585625898436263\n",
      "        }\n",
      "    ],\n",
      "    \"frozen\" :false\n",
      "}\n",
      "CG: {\n",
      "    \"class\" :\"Distribution\",\n",
      "    \"dtype\" :\"str\",\n",
      "    \"name\" :\"DiscreteDistribution\",\n",
      "    \"parameters\" :[\n",
      "        {\n",
      "            \"A\" :0.061102410527201965,\n",
      "            \"C\" :0.5058121809386029,\n",
      "            \"G\" :0.3384433263771247,\n",
      "            \"T\" :0.09464208215707043\n",
      "        }\n",
      "    ],\n",
      "    \"frozen\" :false\n",
      "}\n",
      "PT: {\n",
      "    \"class\" :\"Distribution\",\n",
      "    \"dtype\" :\"str\",\n",
      "    \"name\" :\"DiscreteDistribution\",\n",
      "    \"parameters\" :[\n",
      "        {\n",
      "            \"A\" :0.09055188417721106,\n",
      "            \"C\" :4.501540436607585e-09,\n",
      "            \"G\" :0.01138728389704469,\n",
      "            \"T\" :0.8980608274242039\n",
      "        }\n",
      "    ],\n",
      "    \"frozen\" :false\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "result = \"\\n\".join( \"{}: {}\".format( state.name, state.distribution )\n",
    "\tfor state in untiedmodel.states if not state.is_silent() )\n",
    "    \n",
    "print(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can try our example with tied states."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "tiedmodel = HiddenMarkovModel( \"Tied States\" )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lets redefine the four states."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "background = DiscreteDistribution({'A': 0.25, 'C':0.25, 'G': 0.25, 'T':0.25 })\n",
    "\n",
    "background_one = State( background, name=\"B1\" )\n",
    "CG_island = State( DiscreteDistribution({'A': 0.1, \n",
    "\t'C':0.4, 'G': 0.4, 'T':0.1 }), name=\"CG\" )\n",
    "background_two = State( background, name=\"B2\" )\n",
    "poly_T = State( DiscreteDistribution({'A': 0.1, \n",
    "\t'C':0.1, 'G': 0.1, 'T':0.7 }), name=\"PT\" )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then add the starting transitions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "tiedmodel.add_transition( tiedmodel.start, background_one, 1. );"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then the tranisiton matrix."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "tiedmodel.add_transition( background_one, background_one, 0.9 )\n",
    "tiedmodel.add_transition( background_one, CG_island, 0.1 )\n",
    "tiedmodel.add_transition( CG_island, CG_island, 0.8 )\n",
    "tiedmodel.add_transition( CG_island, background_two, 0.2 )\n",
    "tiedmodel.add_transition( background_two, background_two, 0.8 )\n",
    "tiedmodel.add_transition( background_two, poly_T, 0.2 )\n",
    "tiedmodel.add_transition( poly_T, poly_T, 0.7 )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally adding the ending transitions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "tiedmodel.add_transition( poly_T, tiedmodel.end, 0.3 )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We \"bake\" the model to finalize its structure."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "tiedmodel.bake( verbose=True )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "Now let's use the following sequences to train our model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "sequences = [ numpy.array(list(\"TAGCACATCGCAGCGCATCACGCGCGCTAGCATATAAGCACGATCAGCACGACTGTTTTT\")),\n",
    "\t\t\t  numpy.array(list(\"TAGAATCGCTACATAGACGCGCGCTCGCCGCGCTCGATAAGCTACGAACACGATTTTTTA\")),\n",
    "\t\t\t  numpy.array(list(\"GATAGCTACGACTACGCGACTCACGCGCGCGCTCCGCATCAGACACGAATATAGATAAGATATTTTTT\")) ]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "But before that let's check the distributions in our model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "B1: {\n",
      "    \"class\" :\"Distribution\",\n",
      "    \"dtype\" :\"str\",\n",
      "    \"name\" :\"DiscreteDistribution\",\n",
      "    \"parameters\" :[\n",
      "        {\n",
      "            \"A\" :0.25,\n",
      "            \"C\" :0.25,\n",
      "            \"G\" :0.25,\n",
      "            \"T\" :0.25\n",
      "        }\n",
      "    ],\n",
      "    \"frozen\" :false\n",
      "}\n",
      "B2: {\n",
      "    \"class\" :\"Distribution\",\n",
      "    \"dtype\" :\"str\",\n",
      "    \"name\" :\"DiscreteDistribution\",\n",
      "    \"parameters\" :[\n",
      "        {\n",
      "            \"A\" :0.25,\n",
      "            \"C\" :0.25,\n",
      "            \"G\" :0.25,\n",
      "            \"T\" :0.25\n",
      "        }\n",
      "    ],\n",
      "    \"frozen\" :false\n",
      "}\n",
      "CG: {\n",
      "    \"class\" :\"Distribution\",\n",
      "    \"dtype\" :\"str\",\n",
      "    \"name\" :\"DiscreteDistribution\",\n",
      "    \"parameters\" :[\n",
      "        {\n",
      "            \"A\" :0.1,\n",
      "            \"C\" :0.4,\n",
      "            \"G\" :0.4,\n",
      "            \"T\" :0.1\n",
      "        }\n",
      "    ],\n",
      "    \"frozen\" :false\n",
      "}\n",
      "PT: {\n",
      "    \"class\" :\"Distribution\",\n",
      "    \"dtype\" :\"str\",\n",
      "    \"name\" :\"DiscreteDistribution\",\n",
      "    \"parameters\" :[\n",
      "        {\n",
      "            \"A\" :0.1,\n",
      "            \"C\" :0.1,\n",
      "            \"G\" :0.1,\n",
      "            \"T\" :0.7\n",
      "        }\n",
      "    ],\n",
      "    \"frozen\" :false\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "result = \"\\n\".join( \"{}: {}\".format( state.name, state.distribution ) \n",
    "\tfor state in tiedmodel.states if not state.is_silent() )\n",
    "\n",
    "print(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's train our model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{\n",
       "    \"class\" : \"HiddenMarkovModel\",\n",
       "    \"name\" : \"Tied States\",\n",
       "    \"start\" : {\n",
       "        \"class\" : \"State\",\n",
       "        \"distribution\" : null,\n",
       "        \"name\" : \"Tied States-start\",\n",
       "        \"weight\" : 1.0\n",
       "    },\n",
       "    \"end\" : {\n",
       "        \"class\" : \"State\",\n",
       "        \"distribution\" : null,\n",
       "        \"name\" : \"Tied States-end\",\n",
       "        \"weight\" : 1.0\n",
       "    },\n",
       "    \"states\" : [\n",
       "        {\n",
       "            \"class\" : \"State\",\n",
       "            \"distribution\" : {\n",
       "                \"class\" : \"Distribution\",\n",
       "                \"dtype\" : \"str\",\n",
       "                \"name\" : \"DiscreteDistribution\",\n",
       "                \"parameters\" : [\n",
       "                    {\n",
       "                        \"A\" : 0.3825505694630474,\n",
       "                        \"C\" : 0.23803901803476255,\n",
       "                        \"G\" : 0.20043821449422436,\n",
       "                        \"T\" : 0.1789721980079656\n",
       "                    }\n",
       "                ],\n",
       "                \"frozen\" : false\n",
       "            },\n",
       "            \"name\" : \"B1\",\n",
       "            \"weight\" : 1.0\n",
       "        },\n",
       "        {\n",
       "            \"class\" : \"State\",\n",
       "            \"distribution\" : {\n",
       "                \"class\" : \"Distribution\",\n",
       "                \"dtype\" : \"str\",\n",
       "                \"name\" : \"DiscreteDistribution\",\n",
       "                \"parameters\" : [\n",
       "                    {\n",
       "                        \"A\" : 0.3825505694630474,\n",
       "                        \"C\" : 0.23803901803476255,\n",
       "                        \"G\" : 0.20043821449422436,\n",
       "                        \"T\" : 0.1789721980079656\n",
       "                    }\n",
       "                ],\n",
       "                \"frozen\" : false\n",
       "            },\n",
       "            \"name\" : \"B2\",\n",
       "            \"weight\" : 1.0\n",
       "        },\n",
       "        {\n",
       "            \"class\" : \"State\",\n",
       "            \"distribution\" : {\n",
       "                \"class\" : \"Distribution\",\n",
       "                \"dtype\" : \"str\",\n",
       "                \"name\" : \"DiscreteDistribution\",\n",
       "                \"parameters\" : [\n",
       "                    {\n",
       "                        \"A\" : 0.10879783006206004,\n",
       "                        \"C\" : 0.4779841622676619,\n",
       "                        \"G\" : 0.3129317572182933,\n",
       "                        \"T\" : 0.10028625045198475\n",
       "                    }\n",
       "                ],\n",
       "                \"frozen\" : false\n",
       "            },\n",
       "            \"name\" : \"CG\",\n",
       "            \"weight\" : 1.0\n",
       "        },\n",
       "        {\n",
       "            \"class\" : \"State\",\n",
       "            \"distribution\" : {\n",
       "                \"class\" : \"Distribution\",\n",
       "                \"dtype\" : \"str\",\n",
       "                \"name\" : \"DiscreteDistribution\",\n",
       "                \"parameters\" : [\n",
       "                    {\n",
       "                        \"A\" : 0.09294944999091662,\n",
       "                        \"C\" : 1.4280989721539988e-17,\n",
       "                        \"G\" : 0.0060865515333644584,\n",
       "                        \"T\" : 0.9009639984757188\n",
       "                    }\n",
       "                ],\n",
       "                \"frozen\" : false\n",
       "            },\n",
       "            \"name\" : \"PT\",\n",
       "            \"weight\" : 1.0\n",
       "        },\n",
       "        {\n",
       "            \"class\" : \"State\",\n",
       "            \"distribution\" : null,\n",
       "            \"name\" : \"Tied States-start\",\n",
       "            \"weight\" : 1.0\n",
       "        },\n",
       "        {\n",
       "            \"class\" : \"State\",\n",
       "            \"distribution\" : null,\n",
       "            \"name\" : \"Tied States-end\",\n",
       "            \"weight\" : 1.0\n",
       "        }\n",
       "    ],\n",
       "    \"end_index\" : 5,\n",
       "    \"start_index\" : 4,\n",
       "    \"silent_index\" : 4,\n",
       "    \"edges\" : [\n",
       "        [\n",
       "            4,\n",
       "            0,\n",
       "            1.0,\n",
       "            1.0,\n",
       "            null\n",
       "        ],\n",
       "        [\n",
       "            0,\n",
       "            0,\n",
       "            0.9331385232071054,\n",
       "            0.9,\n",
       "            null\n",
       "        ],\n",
       "        [\n",
       "            0,\n",
       "            2,\n",
       "            0.06686147679289454,\n",
       "            0.1,\n",
       "            null\n",
       "        ],\n",
       "        [\n",
       "            2,\n",
       "            2,\n",
       "            0.9433476039763057,\n",
       "            0.8,\n",
       "            null\n",
       "        ],\n",
       "        [\n",
       "            2,\n",
       "            1,\n",
       "            0.05665239602369436,\n",
       "            0.2,\n",
       "            null\n",
       "        ],\n",
       "        [\n",
       "            1,\n",
       "            1,\n",
       "            0.9580129624347591,\n",
       "            0.8,\n",
       "            null\n",
       "        ],\n",
       "        [\n",
       "            1,\n",
       "            3,\n",
       "            0.04198703756524091,\n",
       "            0.2,\n",
       "            null\n",
       "        ],\n",
       "        [\n",
       "            3,\n",
       "            3,\n",
       "            0.8397947514093073,\n",
       "            0.7,\n",
       "            null\n",
       "        ],\n",
       "        [\n",
       "            3,\n",
       "            5,\n",
       "            0.16020524859069266,\n",
       "            0.3,\n",
       "            null\n",
       "        ]\n",
       "    ],\n",
       "    \"distribution ties\" : [\n",
       "        [\n",
       "            0,\n",
       "            1\n",
       "        ],\n",
       "        [\n",
       "            1,\n",
       "            0\n",
       "        ]\n",
       "    ]\n",
       "}"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tiedmodel.fit( sequences, stop_threshold=0.01 )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's check our new distributions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "B1: {\n",
      "    \"class\" :\"Distribution\",\n",
      "    \"dtype\" :\"str\",\n",
      "    \"name\" :\"DiscreteDistribution\",\n",
      "    \"parameters\" :[\n",
      "        {\n",
      "            \"A\" :0.3825505694630474,\n",
      "            \"C\" :0.23803901803476255,\n",
      "            \"G\" :0.20043821449422436,\n",
      "            \"T\" :0.1789721980079656\n",
      "        }\n",
      "    ],\n",
      "    \"frozen\" :false\n",
      "}\n",
      "B2: {\n",
      "    \"class\" :\"Distribution\",\n",
      "    \"dtype\" :\"str\",\n",
      "    \"name\" :\"DiscreteDistribution\",\n",
      "    \"parameters\" :[\n",
      "        {\n",
      "            \"A\" :0.3825505694630474,\n",
      "            \"C\" :0.23803901803476255,\n",
      "            \"G\" :0.20043821449422436,\n",
      "            \"T\" :0.1789721980079656\n",
      "        }\n",
      "    ],\n",
      "    \"frozen\" :false\n",
      "}\n",
      "CG: {\n",
      "    \"class\" :\"Distribution\",\n",
      "    \"dtype\" :\"str\",\n",
      "    \"name\" :\"DiscreteDistribution\",\n",
      "    \"parameters\" :[\n",
      "        {\n",
      "            \"A\" :0.10879783006206004,\n",
      "            \"C\" :0.4779841622676619,\n",
      "            \"G\" :0.3129317572182933,\n",
      "            \"T\" :0.10028625045198475\n",
      "        }\n",
      "    ],\n",
      "    \"frozen\" :false\n",
      "}\n",
      "PT: {\n",
      "    \"class\" :\"Distribution\",\n",
      "    \"dtype\" :\"str\",\n",
      "    \"name\" :\"DiscreteDistribution\",\n",
      "    \"parameters\" :[\n",
      "        {\n",
      "            \"A\" :0.09294944999091662,\n",
      "            \"C\" :1.4280989721539988e-17,\n",
      "            \"G\" :0.0060865515333644584,\n",
      "            \"T\" :0.9009639984757188\n",
      "        }\n",
      "    ],\n",
      "    \"frozen\" :false\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "result = \"\\n\".join( \"{}: {}\".format( state.name, state.distribution ) \n",
    "\tfor state in tiedmodel.states if not state.is_silent() )\n",
    "\n",
    "print(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that states B1 and B2 are the same after training with tied states, not so without tied states."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
